ENH: disallow numpy generics #145
Conversation
```python
)
def test_binary_operators_vs_numpy_float(op):
    """
    np.float64 is a subclass of float and must be allowed.
```
Have to admit I'm not a fan of this. It is technically correct, np.float64 is a subclass of float, but IMO that's an implementation detail.
Wherever numpy returns a numpy scalar, all other libraries return 0D arrays, so I think we'd better always handle numpy scalars together with arrays, not with Python scalars.
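For context, the subclass relationship in question is one-sided in numpy: only the scalar types whose precision matches Python's own numeric types inherit from them. A quick check in plain numpy, independent of this PR:

```python
import numpy as np

# np.float64 and np.complex128 subclass Python's float and complex,
# but np.float32 does not, and no numpy integer type subclasses int.
print(issubclass(np.float64, float))
print(issubclass(np.complex128, complex))
print(issubclass(np.float32, float))
print(issubclass(np.int64, int))
```

This asymmetry is what makes `isinstance(x, float)` accept `np.float64` but reject `np.float32`.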
Are you saying that

```python
x = xp.asarray(1, dtype=xp.float32)
y = np.float64(2)
z = xp.maximum(x, y)
```

should crash instead of returning an `Array[float32]`?
Definitely not crash :-).
I think the following is correct, because `np.float64(2)` is conceptually `np.asarray(2.)`, not the Python scalar `2.`:

```python
In [4]: xp = array_namespace(np.empty(2))

In [5]: xp.maximum(xp.asarray(1, dtype=xp.float32), 2)
Out[5]: np.float32(2.0)

In [6]: xp.maximum(xp.asarray(1, dtype=xp.float32), np.float64(2))
Out[6]: np.float64(2.0)
```
Likewise, the following is also correct, for the same reason:

```python
In [12]: xpt = array_namespace(torch.empty(2))

In [13]: xpt.maximum(xpt.asarray(1, dtype=xpt.float32), np.float64(2))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[13], line 1
----> 1 xpt.maximum(xpt.asarray(1, dtype=xpt.float32), np.float64(2))

File ~/repos/array-api-compat/array_api_compat/torch/_aliases.py:91, in _two_arg.<locals>._f(x1, x2, **kwargs)
     88 @_wraps(f)
     89 def _f(x1, x2, /, **kwargs):
     90     x1, x2 = _fix_promotion(x1, x2)
---> 91     return f(x1, x2, **kwargs)

TypeError: maximum(): argument 'other' (position 2) must be Tensor, not numpy.float64
```
> ```python
> In [6]: xp.maximum(xp.asarray(1, dtype=xp.float32), np.float64(2))
> Out[6]: np.float64(2.0)
> ```

This looks definitely wrong to me. array-api-strict shouldn't quietly accept a numpy array.
Right now the output is the expected one without explicitly disallowing inheritance:

```python
>>> xp.maximum(xp.asarray(1, dtype=xp.float32), np.float64(2))
Array(2., dtype=array_api_strict.float32)
>>> xp.maximum(xp.asarray(1, dtype=xp.float32), np.asarray(2, dtype=np.float64))
TypeError: Only real numeric dtypes are allowed in maximum(...). Got array_api_strict.float32 and float64.
>>> xp.maximum(xp.asarray(1, dtype=xp.int32), np.int64(2))
TypeError: Only real numeric dtypes are allowed in maximum(...). Got array_api_strict.int32 and int64.
```
nit: the message of the TypeError is a bit misleading; it should call out explicitly something like "only Array and Python scalars are accepted; got numpy.ndarray".
> This looks definitely wrong to me. array-api-strict shouldn't quietly accept a numpy array.
> Right now the output is the expected one without explicitly disallowing inheritance:

Yes, in my example `xp` is `array_api_compat.numpy`. As your example shows, array-api-strict does the right thing in main, too. So what is this PR fixing?
So this PR does two things:

- adds a test that binary dunders reject arrays on different devices;
- adds tests which codify the behavior of binary dunders with numpy scalars.

The former is great IMO, the latter less so. Essentially, I'd consider it a bug that `np.float32(1.)` behaves differently from `np.float64(1.)`. Either they both should duck-type as the built-in `float`, or they both should duck-type as a 0D array of the corresponding dtype.

To tell which behavior is better, consider how a numpy scalar might come to array-api-strict. I see two possible scenarios:

- a user explicitly creates an `np.int64(python_int)` or `np.float32(python_float)`. Old code does that sometimes. IMO the user code should decide whether they need a builtin object or a 0D array, and change accordingly.
- some numpy operation (indexing or a ufunc, for instance) creates a numpy scalar. In those cases any other backend would have created a 0D array.

Therefore, `array-api-strict` can have two possible behaviors: reject all numpy scalars or convert them into 0D arrays.

Rejecting numpy scalars is probably not very useful: it would require that all user code explicitly calls `asarray` just to make user tests pass on both numpy and array-api-strict.

Converting them all sounds more useful for downstream testing, which is the main purpose of `array-api-strict`.

Therefore, I suggest that instead of codifying the existing bug, we add a test or several that all instances of `np.generic` duck-type with arrays in both binary functions and binary dunders, and actually make it happen: I think it will require something like `if isinstance(x, np.generic): x = xp.asarray(x)` before checking `isinstance(x, bool | int | float | complex)`.
```python
else:
    raise TypeError(f"Expected Array | python scalar; got {type(other)}")
elif not isinstance(other, bool | int | float | complex):
    raise TypeError(f"Expected Array or Python scalar; got {type(other)}")
```
Note to self: the changes to `_check_device` are just cosmetics plus a rename; the functionality is the same.
```python
"__rshift__": "integer",
"__sub__": "numeric",
"__truediv__": "floating-point",
"__xor__": "integer or boolean",
```
Note to self: the values here and in `unary_op_types` below correspond to dtype categories from `_dtypes.py`, so these two dicts are similar to those in https://github.com/data-apis/array-api-strict/blob/main/array_api_strict/tests/test_elementwise_functions.py#L36
```python
a = asarray(1)
i64 = np.int64(1)
with pytest.raises(TypeError, match="Expected Array or Python scalar"):
    getattr(a, op)(i64)
```
Note to self: this test basically codifies the status quo; the behavior is the same on main.
```python
for op in binary_op_dtypes:
    assert isinstance(func(f64), Array)
    with pytest.raises(TypeError, match="Expected Array or Python scalar"):
        func(f32)
```
This test also reflects the status quo: the behavior is exactly the same on main.
```python
a = asarray(1, dtype=dtype, device=CPU_DEVICE)
b = asarray(1, dtype=dtype, device=Device("device1"))
with pytest.raises(ValueError, match="different devices"):
    getattr(a, op)(b)
```
This test would be great to extend to binary functions, too. I'd expect them to work just the same, because they reuse the `._promote_scalar` logic. If `xp.add(a, b)` differs from `a.__add__(b)`, it's a bug, so it would be great to test for.
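As a plain-numpy illustration of the invariant being proposed (this is not array-api-strict code): the functional form and the dunder should agree on both value and dtype, because they should share the same promotion logic.

```python
import numpy as np

a = np.asarray(1.0, dtype=np.float32)
b = np.float64(2.0)

# The function and the operator should go through the same promotion
# logic, so their results must match in both value and dtype.
func_result = np.add(a, b)
dunder_result = a.__add__(b)
print(func_result, func_result.dtype)
print(dunder_result, dunder_result.dtype)
```

(The resulting dtype depends on the numpy version's promotion rules, but the two paths must always agree with each other.)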
It turns out extending this to binary functions caused the scope to blow up, so I moved it to a separate PR: #148.
I've reworked this PR to disallow np.float64 and np.complex128 from being treated as float and complex in Array methods.
LGTM now.
There is a non-zero chance that this causes issues downstream. However, I believe the change is correct and useful, and it's downstream that should adjust to better align with the spec.
If push comes to shove, we'll temporarily revert. So let's do EAFP, not LBYL, and land this. Thanks @crusaderky
Well, the whole purpose of array-api-strict is to cause issues early during testing.
Exactly. My only point is that if the patch proves too disruptive downstream, we can temporarily revert to give upstream time to adapt. Happened with …
In #135, binary operators vs. numpy.int64 became (correctly) disallowed. Add a test for it.
Also test binary operators vs. numpy.float64 and numpy.complex128, which must be ~~allowed~~ rejected even though they are subclasses of float and complex respectively.
The same treatment for functions is in a separate PR: #148.